Characterizing Web Pornography Consumption from Passive Measurements
Web pornography represents a large fraction of Internet traffic, with
thousands of websites and millions of users. Studying web pornography
consumption helps in understanding human behavior and is crucial for medical
and psychological research. However, given the lack of public data, such
studies typically build on surveys, which are limited by several factors,
e.g., unreliable answers that volunteers may (involuntarily) provide.
In this work, we collect anonymized accesses to pornography websites using
HTTP-level passive traces. Our dataset includes about 15,000 broadband
subscribers over a period of 3 years. We use it to provide quantitative
information about the interactions of users with pornographic websites,
focusing on time and frequency of use, habits, and trends. We distribute our
anonymized dataset to the community to ease reproducibility and allow further
studies.
Comment: Passive and Active Measurement Conference 2019 (PAM 2019). 14 pages,
7 figures
A Workload Characterization Methodology for WWW Applications
With World Wide Web (WWW) traffic being the fastest-growing portion of the load on the Internet, describing and characterizing this workload is a central issue for any performance evaluation study. In this paper, we present an approach for generating a profile of the requests submitted to a WWW server (GET, POST, ...) that explicitly takes into account user behavior when surfing the WWW (i.e., navigating through it via a WWW browser). We present the Probabilistic Attributed Context-Free Grammar (PACFG) as a model for translating from this user-oriented view of the workload (namely, the conversations made within browser windows) to the methods submitted to the Web servers (or, respectively, to a proxy server). The characterization at this lower level is essential for estimating the traffic on the net and is thus the starting point for evaluations of net traffic.
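The grammar-based generation the abstract describes can be illustrated with a toy probabilistic context-free grammar that expands a user session into a stream of HTTP methods. This is a minimal sketch: the nonterminals, rules, and probabilities below are hypothetical examples, not the PACFG from the paper (which additionally attaches attributes to rules).

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of
# (probability, expansion) pairs; probabilities per nonterminal sum to 1.
# Symbols not in the grammar are terminals (HTTP methods).
GRAMMAR = {
    "Session": [(1.0, ["Page", "Rest"])],
    "Rest":    [(0.6, ["Page", "Rest"]), (0.4, [])],
    "Page":    [(0.8, ["GET"]), (0.2, ["POST"])],
}

def expand(symbol, rng=random):
    """Recursively expand a symbol into a list of terminal HTTP methods."""
    if symbol not in GRAMMAR:          # terminal: emit the method itself
        return [symbol]
    r, acc = rng.random(), 0.0
    for prob, expansion in GRAMMAR[symbol]:
        acc += prob
        if r <= acc:                   # pick this rule with its probability
            out = []
            for s in expansion:
                out.extend(expand(s, rng))
            return out
    return []

random.seed(0)
print(expand("Session"))  # -> ['GET', 'GET', 'GET']
```

Each generated list is one "conversation" of server-side methods; a workload generator would replay many such sessions to synthesize traffic.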
HyperScout: Displaying Extended Type Information on the World Wide Web - Concepts and Effects
Learning Web Request Patterns
Most requests on the Web are made on behalf of human users, and like other human-computer interactions, the actions of the user can be characterized by identifiable regularities. Many of these patterns of activity, both within a user and between users, can be identified and exploited by intelligent mechanisms for learning Web request patterns. Our focus is on Markov-based probabilistic techniques, both for their predictive power and their popularity in Web modeling and other domains. Although history-based mechanisms can provide strong performance in predicting future requests, their accuracy can be improved by incorporating predictions from additional sources. In this chapter we review the common approaches to learning and predicting Web request patterns. We provide a consistent description of various algorithms (often independently proposed), and compare the performance of those techniques on the same data sets. We also discuss concerns for accurate and realistic evaluation of these techniques.
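The history-based Markov prediction the chapter surveys can be sketched in its simplest first-order form: count, for each page in a training history, how often each page follows it, and predict the most frequent successor. The sample request history below is hypothetical.

```python
from collections import Counter, defaultdict

def train(history):
    """Count successor frequencies for each page in a request sequence."""
    successors = defaultdict(Counter)
    for cur, nxt in zip(history, history[1:]):
        successors[cur][nxt] += 1
    return successors

def predict(successors, current):
    """Return the most frequent next page, or None for an unseen page."""
    if current not in successors:
        return None
    return successors[current].most_common(1)[0][0]

# Hypothetical training history of visited URLs:
history = ["/", "/news", "/", "/news", "/sports", "/", "/news"]
model = train(history)
print(predict(model, "/"))  # -> "/news" (most frequent successor of "/")
```

Higher-order variants condition on the last k pages instead of one, trading data sparsity for context; blending several such predictors is one way to draw on "additional sources" as the chapter suggests.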
MixedTrails: Bayesian hypothesis comparison on heterogeneous sequential data
Sequential traces of user data are frequently observed online and offline,
e.g., as sequences of visited websites or as sequences of locations captured by
GPS. However, understanding factors explaining the production of sequence data
is a challenging task, especially since the data generation is often not
homogeneous. For example, navigation behavior might change in different phases
of browsing a website, or movement behavior may vary between groups of users.
In this work, we tackle this task and propose MixedTrails, a Bayesian approach
for comparing the plausibility of hypotheses regarding the generative processes
of heterogeneous sequence data. Each hypothesis is derived from existing
literature, theory or intuition and represents a belief about transition
probabilities between a set of states that can vary between groups of observed
transitions. For example, when trying to understand human movement in a city
given some observed data, a hypothesis assuming that tourists are more likely
than locals to move towards points of interest can be shown to be more
plausible than a hypothesis assuming the opposite. Our approach incorporates
such hypotheses as Bayesian priors in a generative mixed transition Markov
chain model, and compares their plausibility utilizing Bayes factors. We
discuss analytical and approximate inference methods for calculating the
marginal likelihoods for Bayes factors, give guidance on interpreting the
results, and illustrate our approach with several experiments on synthetic and
empirical data from Wikipedia and Flickr. Thus, this work enables a novel kind
of analysis for studying sequential data in many application areas.
Comment: Published in Data Mining and Knowledge Discovery (2017) and presented
at ECML PKDD 201
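The core comparison in this line of work (expressing a hypothesis as a Dirichlet prior over a state's transition probabilities and scoring it by the marginal likelihood of the observed transition counts) can be sketched for a single state's outgoing transitions. The counts and prior parameters below are toy, hypothetical values, and the full MixedTrails model additionally partitions transitions into groups; this sketch only shows the Bayes-factor mechanics for one Dirichlet-multinomial row.

```python
from math import lgamma

def log_marginal(counts, alphas):
    """Log Dirichlet-multinomial marginal likelihood of one state's
    observed transition counts under a Dirichlet(alphas) prior."""
    n, a = sum(counts), sum(alphas)
    lm = lgamma(a) - lgamma(a + n)
    for c, al in zip(counts, alphas):
        lm += lgamma(al + c) - lgamma(al)
    return lm

# Observed transitions from one state to three target states:
counts = [30, 5, 5]
# Hypothesis A concentrates prior mass on the first target;
# hypothesis B expresses a uniform belief. Same total concentration,
# so the comparison is fair.
h_a = [10.0, 1.0, 1.0]
h_b = [4.0, 4.0, 4.0]

log_bayes_factor = log_marginal(counts, h_a) - log_marginal(counts, h_b)
print(log_bayes_factor > 0)  # -> True: hypothesis A explains the data better
```

A positive log Bayes factor favors hypothesis A, mirroring the tourists-versus-locals example: the prior that places belief where the transitions actually go is judged more plausible.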